Methods for Large-Scale Mining of Networks of Human Genes

Authors

  • Tor-Kristian Jenssen
  • Lisa M. J. Öberg
  • Magnus L. Andersson
  • Jan Komorowski
Abstract

In molecular biology there is much interest in various types of relationships between genes. Because the field is complex and develops rapidly, much of this knowledge exists only in free-text form. A database of relationships between genes may allow background knowledge to be used in computerised analyses. As far as we know, no comprehensive manually curated database of this kind exists, and constructing and maintaining such a database manually would be very labour-intensive. Efficient automated methods for extracting and structuring relationships between genes from free text would therefore be valuable. A database named PubGene has previously been created; it contains a comprehensive network of human genes built by automated extraction of co-occurrences of gene terms in over 10 million MEDLINE records. Co-occurring genes were linked under the hypothesis that two genes co-occur only if they have some biological relationship. In this paper, we show that for the subset of human genes encoding enzymes, pairs of co-occurring enzyme genes are significantly more closely related biologically than randomly chosen pairs of these genes. Manual inspection, however, shows that some of the links in PubGene are not correct, and it also indicates how the noise can be reduced. We propose a complementary method for automated extraction of relationships between genes using information from the Science Citation Index (SCI) database. We relate two genes if they have been co-referred, that is, if their reference articles are co-cited in a third article. This alternative approach confirms relationships found in PubGene and also finds other relevant relationships. Although further experiments are ...

1 Knowledge Systems Group, Department of Computer and Information Science, Norwegian University of Science and Technology, N-7491 Trondheim, Norway.

2 Molecular Biology, AstraZeneca R&D Mölndal, S-431 83 Mölndal, Sweden.

† These authors contributed equally
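The two linking strategies named in the abstract, co-occurrence of gene terms in records and co-referral through co-cited reference articles, can be illustrated with a minimal sketch. This is not the PubGene or SCI pipeline: the function names, the naive tokenisation, and the toy records, citation lists, and reference-article identifiers below are all assumptions made for illustration only.

```python
import re
from collections import Counter
from itertools import combinations


def cooccurrence_links(records, gene_terms):
    """Count, for each pair of gene terms, the records that mention both.

    Assumption: a gene is 'mentioned' if its symbol appears as a whole
    alphanumeric token; real gene-name matching is far more involved.
    """
    wanted = {g.upper() for g in gene_terms}
    links = Counter()
    for text in records:
        tokens = set(re.findall(r"[A-Za-z0-9]+", text.upper()))
        found = sorted(wanted & tokens)
        for pair in combinations(found, 2):
            links[pair] += 1
    return links


def coreferred_links(citations, reference_articles):
    """Count, for each gene pair, the citing articles that cite a
    reference article of both genes (a toy reading of 'co-referred').

    Note: an article that is a reference for two genes links them with a
    single citation in this simplified version.
    """
    article_to_genes = {}
    for gene, articles in reference_articles.items():
        for article in articles:
            article_to_genes.setdefault(article, set()).add(gene)
    links = Counter()
    for cited in citations.values():
        genes = set()
        for article in cited:
            genes |= article_to_genes.get(article, set())
        for pair in combinations(sorted(genes), 2):
            links[pair] += 1
    return links


# Hypothetical toy data, not taken from MEDLINE or SCI.
records = [
    "TP53 and MDM2 are discussed together in this record.",
    "MDM2 is mentioned with TP53 again.",
    "BRCA1 appears alone here.",
]
cooccurrence = cooccurrence_links(records, {"TP53", "MDM2", "BRCA1"})

citations = {
    "paper1": ["refA", "refB"],
    "paper2": ["refA"],
    "paper3": ["refB", "refC"],
}
reference_articles = {"TP53": ["refA"], "MDM2": ["refB"], "BRCA1": ["refC"]}
coreferred = coreferred_links(citations, reference_articles)
```

In both functions the resulting counts could serve as edge weights in a gene network; thresholding on the count is one plausible way to reduce noisy links, though the abstract does not say which noise-reduction approach the authors adopt.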

Similar resources

Automatic Discovery of Technology Networks for Industrial-Scale R&D IT Projects via Data Mining

Industrial-scale R&D IT projects depend on many sub-technologies, which need to be understood and have their risks analysed before the project can begin if it is to succeed. When planning such an industrial-scale project, the list of technologies and their associations with each other is often complex and forms a network. Discovery of this network of technologies is time consumi...


Community Detection using a New Node Scoring and Synchronous Label Updating of Boundary Nodes in Social Networks

Community structure is vital for discovering the important structures and potential properties of complex networks. In recent years, improving the quality of local community detection approaches has become a hot spot in the study of complex networks, owing to their linear time complexity and applicability to large-scale networks. However, there are many shortcomings in these methods such as in...


A hybrid model based on machine learning and genetic algorithm for detecting fraud in financial statements

Financial statement fraud has increasingly become a serious problem for business, government, and investors. In fact, this threatens the reliability of capital markets, corporate heads, and even the audit profession. Auditors in particular face their apparent inability to detect large-scale fraud, and there are various ways to identify this problem. In order to identify this problem, the majori...


A new conforming mesh generator for three-dimensional discrete fracture networks

Nowadays, numerical modeling plays a key role in analyzing hydraulic problems in fractured rock media. The discrete fracture network model is one of the most widely used numerical models for simulating the geometrical structure of a rock mass. In such media, discontinuities are considered as discrete paths for fluid flow through the rock mass while its matrix is assumed impermeable. There are two main pa...


LPKP: location-based probabilistic key pre-distribution scheme for large-scale wireless sensor networks using graph coloring

Communication security of wireless sensor networks is achieved using cryptographic keys assigned to the nodes. Due to resource constraints in such networks, random key pre-distribution schemes are of high interest. Although most of these schemes do not consider location information, there are scenarios in which nodes can obtain location information after their deployment. In this paper,...


Target Tracking Based on Virtual Grid in Wireless Sensor Networks

One of the most important and typical applications of wireless sensor networks (WSNs) is target tracking. Although target tracking can provide benefits for large-scale WSNs and organize them into clusters, tracking a moving target in cluster-based WSNs suffers from a boundary problem. The main goal of this paper was to introduce an efficient and novel mobility management protocol namely Target Tr...


Publication date: 2001